CHAPTER I


TO  WHAT  EXTENT  DO  RELIABILITY  COEFFICIENTS  OF  STANDARD  TESTS 
MEASURE  UP  TO  STANDARDS  ESTABLISHED  BY  EDUCATIONAL  THEORISTS  ? 

PURPOSE OF STUDY. The purpose of this study is to make a
comparison  of  the  standards  set  by  test  theorists  for  reliability 
coefficients  and  the  data  supplied  by  test  manuals,  to  see  to  what 
extent  the  two  seem  to  agree  or  disagree.  In  a broader  sense,  it 
is  based  on  the  need  to  acquaint  test  users  with  data  pertaining  to 
the  reliability  coefficient;  what  they  should  look  for  and  where 
they  should  look  for  it. 

SOURCE OF PROBLEM. "Test scores may never be accepted at their
face value but must be always considered as only approximate indications
of the true relative status of the individuals." 1/

One  of  the  major  shortcomings  of  testing  programs  has  been  the 
fact  that  test  users  do  interpret  scores  at  face  value.  The  test  is 
given  in  May  to  the  sixth  grade  and  the  pupil  either  measures  a grade 
equivalent  of  6.8  or  over,  in  which  case  he  is  considered  up  to  grade 
or  he  measures  less  than  6.8  and  is  considered  in  need  of  help  to 
bring  him  up  to  those  standards.  Often  in  situations  like  this,  the 
teachers  find  the  standard  tests  too  great  a chore  for  the  benefits 
derived  from  them,  and  small  wonder. 


1. E. F. Lindquist, A First Course in Statistics, Houghton Mifflin Co.,
Boston, 1938.
Test theorists, test makers, authors of texts on measurement,
and guidance people, as well as statisticians, have been proclaiming
at least since the early 1920's the need for greater skill in the
interpretation  of  tests.  Much  of  this  skill  must  be  based  on  an 
understanding  of  test  construction  and  of  the  research  procedure  and 
statistics  involved. 

One  of  the  important  concepts  thus  involved  is  the  reliability 
coefficient. 

"Complete  and  detailed  information  concerning  the  reliability  of 
a test  is  a major  factor  not  only  in  enabling  the  research  worker  to 
determine  a test's  usefulness  in  dealing  with  a particular  problem, 
but  also  to  assist  him  to  interpret  results  obtained.  The  acquisition 
of  reasonably  complete  data  regarding  a test's  reliability  requires 
the  planning  and  execution  of  one  or  more  special  experiments,  the 
collection and analysis of a considerable body of data, and the
presentation, preferably in the Manual of Directions accompanying the
test, of the findings of such analysis." 1/

JUSTIFICATION. The reliability coefficient has been much
discussed, both praised and abused, and at the present time it is
not accepted by some eminent test theorists as a satisfactory measure
of reliability. But it is the best known statistical concept for
expressing the estimate of reliability, and widespread acquaintance
with and appreciation of the factors and values involved could not help
but lead to a more realistic use of such coefficients in evaluating

1. R. W. B. Jackson and G. A. Ferguson, Studies on the Reliability of
Tests, University of Toronto, Toronto, 1941.
the quality of published tests. Test makers, as well as users, might
find  it  well  worth  their  while  to  give  the  matter  of  a more  adequate 
presentation  of  reliability  data  serious  consideration. 

To the test user then, there still remains the problem of
becoming better acquainted with the meaning, use and interpretation of the
reliability  coefficients.  A survey  of  what  the  test  user  should  look 
for  and  where  he  should  expect  to  find  such  information  available 
should  prove  helpful. 

SCOPE. This study will center around the reliability coefficient
as  the  most  widely  accepted  measure  of  reliability,  the  standards 
upheld  for  reliability  coefficients  by  test  theorists,  and  the  data 
they  consider  necessary  for  its  interpretation.  The  point  of  view  will 
be  that  of  the  test  user  and  what  the  reliability  coefficient  means  to 
him. 

The manuals of standard tests for grades 4, 5, and 6 will be
examined  to  find  the  data  which  they  provide  to  help  guide  the  test 
users  in  this  respect. 

DEFINITION.  Reliability  as  applied  to  a test  is  the  degree  of 
accuracy or precision with which the test measures that which it does
measure.  A yard  stick  would  not  be  a reliable  instrument  for  the 
mechanic  measuring  thousandths  of  an  inch  nor  would  the  micrometer 
be  reliable  in  measuring  yard  goods. 

In  the  application  of  measurement  to  matters  of  less  tangible 
nature  such  as  knowledge,  intelligence,  skills,  or  more  specifically 
samples  of  these,  the  question  of  reliability  becomes  much  more 
complicated. To measure the level of arithmetic achievement of a sixth
grader,  it  is  hardly  possible  or  practical  to  prepare  items  covering 
every  arithmetic  fact  and  skill  which  the  student  has  acquired. 

Instead,  a number  of  representative  items,  planned  in  accordance  with 
the  objectives  of  the  arithmetic  curriculum,  are  used.  Thus  a limited 
number  of  selected  items,  grouped  to  form  a test,  become  a measure  of 
achievement in arithmetic. A worthwhile evaluation of the
reliability of this item sampling involves observation, experimentation,
and statistical calculation. The result of this process is an
estimate of the reliability of the test, which is
most frequently expressed numerically as the reliability coefficient.

As  a statistical  concept  the  reliability  coefficient  is  a 
special  application  of  the  correlation  coefficient  which  is  a 
mathematical  expression  of  the  degree  of  association  existing  between 
two  or  more  measures  of  the  same  kind. 

In  calculating  the  reliability  coefficient  of  tests,  three 
different  methods  are  used.  One  method  is  the  correlation  of  the 
scores  for  the  first  and  second  performances  of  the  same  form  of  a test 
administered to the same group, called the retest coefficient.
Another is the correlation of the scores for two equivalent forms of a
test given to the same group, called the interform coefficient. The third is a
coefficient  of  internal  consistency,  and  is  obtained  by  correlation  of 
two  equivalent  halves  of  the  items  of  one  test,  corrected  with  the 
Spearman-Brown  formula.  A coefficient  of  internal  consistency  may 
also  be  obtained  by  the  use  of  the  Kuder-Richardson  formula. 
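
The three methods described above can be sketched in code. What follows is only a minimal illustration and not a procedure taken from any test manual: the function names and the small table of right/wrong item scores are hypothetical, the odd-even split is just one way of forming two halves, and the Kuder-Richardson function assumes the form commonly known as formula 20, computed with the population variance of the total scores.

```python
from math import sqrt
from statistics import mean, pvariance


def pearson_r(x, y):
    """Product-moment correlation of two equal-length lists of scores.

    Applied to first and second administrations it gives the retest
    coefficient; applied to two equivalent forms, the interform coefficient.
    """
    mx, my = mean(x), mean(y)
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / sqrt(sxx * syy)


def spearman_brown(r_half):
    """Step a half-test correlation up to the full length of the test."""
    return 2 * r_half / (1 + r_half)


def split_half(items):
    """Odd-even split-half coefficient, corrected by Spearman-Brown.

    items: one row per pupil, each row a list of 0/1 item scores.
    """
    odd = [sum(row[0::2]) for row in items]
    even = [sum(row[1::2]) for row in items]
    return spearman_brown(pearson_r(odd, even))


def kuder_richardson_20(items):
    """Internal consistency from item difficulties and total-score variance."""
    n_pupils, n_items = len(items), len(items[0])
    total_variance = pvariance([sum(row) for row in items])
    sum_pq = 0.0
    for j in range(n_items):
        p = sum(row[j] for row in items) / n_pupils  # proportion passing item j
        sum_pq += p * (1 - p)
    return (n_items / (n_items - 1)) * (1 - sum_pq / total_variance)


# Hypothetical data: six pupils, four items scored right (1) or wrong (0).
scores = [
    [1, 1, 1, 1],
    [1, 1, 1, 0],
    [1, 1, 0, 0],
    [1, 0, 0, 0],
    [0, 0, 0, 0],
    [1, 1, 1, 1],
]
print(round(split_half(scores), 3))
print(round(kuder_richardson_20(scores), 3))
```

Even on the same set of papers the two internal-consistency figures need not agree, which is one reason a manual should say which method produced the coefficient it reports.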
CHAPTER II


HISTORY  OF  THE  RELIABILITY  COEFFICIENT 

Studies  on  the  subject  of  reliability  abound.  It  would  take  a 
long  list  to  note  even  those  more  worthwhile  works  on  reliability 
and  reliability  coefficients.  For  our  purpose,  however,  those 
studies  which  help  us  to  evaluate  reliability  data  which  we  find 
in  test  manuals  are  the  most  pertinent. 

"The term 'reliability coefficient' was introduced by
Spearman in 1904 to denote the correlation between scores made on
comparable forms of a test." 1/

In  the  years  1904,  1907,  and  1910,  he  published  papers  in 
which  he  made  known  his  theories  on  applying  correlation  methods 
to  psychological  data  and  tests. 

He had taken his ideas on correlation from the work of
Karl Pearson, who had in turn found his inspiration in the
work of Galton. Many other statistical concepts used in education
are also derived from the work of these two men.

The  application  of  these  methods  to  test  data  was  made  popular 
by  the  publications  of  Spearman  and  in  all  probability  by  the 
coincidence  of  their  coming  forth  at  a time  when  Thurstone  needed 
such  a formula  for  his  work  in  mental  measurements.  The  two 
names  attached  to  the  theory  of  reliability  coefficient  gave  it 


1. A. Anastasi, "Influence of Practice Upon Test Reliability,"
Journal of Educational Psychology, 25 (1934), pp. 521-550.
considerable prestige. In the opinion of some, that may be one of
the reasons why it is so much better known than certain other
methods of estimating reliability. 1/

According  to  T.  L.  Kelley,  Spearman  (1904,  1907)  introduced 
the  term  "reliability  coefficient"  and  used  it  to  designate  r^ , 
the  correlation  between  comparable  tests;  Brown  (1911)  used  the  term 
to mean r^, the correlation between repeated tests. 2/

Spearman and Brown both developed the formula for prophesying
the increase of reliability by increasing the length of the test.

The two men worked independently but published their results
simultaneously, which is why it is called the Spearman-Brown formula. 1/
These two men, then, were responsible for the three
most prevalent methods of calculating the reliability coefficient
of tests.

On the use of the Spearman-Brown formula, we have Holzinger's
studies. In the first, he compared the reliability coefficient
of each component part of an intelligence test (corrected by the
Spearman-Brown formula) with the reliability coefficient of the
complete test. This indicated that the Spearman-Brown formula
over-predicted the reliability of the parts. A second study,


1. Helen M. Walker, Studies in the History of Statistical Methods,
The Williams and Wilkins Co., Baltimore, 1929.

2. T. L. Kelley, Interpretation of Educational Measurements,
World Book Co., New York, 1927.
however, which he did in collaboration with Clayton gave a
prediction with a fair measure of accuracy. The use of accurately
calibrated  test  materials  produced  results  which  indicated  that 
the  formula  predicted  accurately. 

Others  like  Remmers,  Ruch,  Kelley,  and  Jackson  followed  with 
like  studies  also  on  using  the  Spearman-Brown  formula. 

The results indicated that coefficients from the split-half method
corrected by the Spearman-Brown formula tended to be higher and those
from the interform method more conservative. 1/

The Kuder-Richardson formula, often used by test makers,
dates back to about 1937. Work with it shows a tendency to
underestimate reliability. It is most accurate when used with tests
composed of items of equal difficulty.

Numerous empirical studies have been carried out on the
different types of coefficients, each attempting to prove which is
the most desirable and most accurate estimate of test reliability.
It  is  only  necessary  to  recall  a few  of  them  to  give  a picture  of 
the  trend. 

Kelley insisted that the coefficient of correlation of
equivalent forms was the 'correct' reliability coefficient. When
only one form of the test was available, he recommended the
split-half coefficient corrected by the Spearman-Brown formula as the more accurate.

1. R. W. B. Jackson and G. A. Ferguson, Studies on the Reliability
of Tests, University of Toronto, Toronto, 1941, pp. 10-11.
The retest coefficient he dismissed as "very misleading, . . .
a correlation between errors, . . . that we shall call the
correlation between repeat tests, a retesting coefficient, and attach
little importance to it." 1/

Dunlap's reasoning, while showing the same preferences, is more
enlightening and more carefully analyzed: "The Spearman-Brown method
gives us slightly higher coefficients of correlation than does the
correlation . . . between . . . two forms due to the fact that it
does not include the quotidian variability, but does include the
situation error. A working approximation of the test reliability
is secured by this method."

"The Spearman-Brown formula . . . gives the reliability of the
test  relatively  independent  of  the  reliability  of  the  subject.  It 
should  be  noted  that  the  true  score  of  the  Spearman-Brown  formula  is 
the  'true'  ability  at  the  instant,  while  the  'true'  score  of  the 
intercorrelation  of  two  forms  is  the  true  underlying  ability  or 
average  ability  of  the  subject.  This  last  has,  perhaps,  more  meaning 
psychologically  and  in  pedagogical  practice."  2/ 

But Jordan disagreed. "Dunlap does not, however, bring data to
the support of his conclusion." 3/

"The  coefficient  derived  from  correlating  odd-even  items  of  a 
test  most  nearly  represents  the  true  reliability  of  the  testing 


1. T. L. Kelley, Op. cit., p. 59.

2. J. W. Dunlap, "Comparable Tests and Reliability," Journal of
Educational Psychology, Vol. XXIV, Sept. 1933, pp. 442-455.

3. R. C. Jordan, "An Empirical Study of the Reliability Coefficient,"
Journal of Educational Psychology, Vol. 26, 1935, pp. 416-426.


instrument.  Pupil  variability  has  been  eliminated."  1/ 

Thorndike, in his report on research in the service during the
war, presents some interesting differentiations of the types of
correlation and their applicability to different needs. He speaks from
practical application of these principles and considers each type
of coefficient applicable for certain purposes, to fulfill certain
needs. 2/

Cronbach's study is in the same vein. He maintains, "No one
best estimate of reliability exists." Each represents a different
type of estimate of reliability, each has different values, and each
needs to be interpreted according to its own use and purpose. 3/

It  seems  logical  then  to  conclude  that  the  trend  is  toward  a 
more  practical,  more  realistic  approach.  There  is  more  and  more  of 
a tendency  to  accept  the  philosophy  professed  by  Slocombe  in  1928. 

"Until  methods  are  seriously  applied  by  psychologists  concerned  with 
results,  and  their  value  proven  thereby,  they  must  be  regarded  as 
just exceedingly interesting possibilities." 4/

Kelley's  work  has  attached  a set  of  numerical  standards  of 
acceptability  to  the  reliability  coefficient  based  on  careful 
mathematical  analysis.  These  are  often  quoted.  However,  people 

1. R. C. Jordan, "An Empirical Study of the Reliability Coefficient,"
Journal of Educational Psychology, 26 (1935), pp. 416-426.

2. R. L. Thorndike, Research Problems and Techniques, Report No. 3,
Army Air Forces Aviation Psychology Program Research Reports, 1947.

3. L. J. Cronbach, "Test Reliability: Its Meaning and Determination,"
Psychometrika, XII, March 1947, pp. 1-16.

4. C. S. Slocombe, "Truman L. Kelley Measures Mental Traits," Journal
of Educational Psychology, Vol. 19 (1928), pp. 479-501.
who have worked with tests, users and makers alike, tend to
consider his "minimum" requirements to be actually "optimum"
requirements; but they continue nevertheless to be very meaningful to
people working in the field of educational measurement.

Douglass  and  Cozens  (1929),  and  Thomson  (1957)  did  studies  on 
reliability coefficients for test batteries. These point out the
greater  accuracy  of  the  reliability  coefficient  of  individual  tests 
in  the  battery  and  the  need  of  knowing  these  as  well  as  that  for 
the  whole  battery.  1/ 

Symonds' study on the variety of causes affecting the
reliability of tests was important and was followed by a series of
other studies on the various factors his study mentioned.

Symonds  and  Thurstone  both  did  studies  to  prove  that  tests 
made  up  of  items  of  .5  difficulty  value  measure  most  accurately 
and  therefore  have  the  highest  reliability  coefficients.  2/ 

The problem is far from settled. Studies by statisticians,
test makers, and test users which will clarify the issue as it stands,
and invite wider use on the part of test users as well as possible
improvement on the part of all concerned, are more than justified.


1. R. W. B. Jackson and G. A. Ferguson, Op. cit., pp. 12-14.

2. P. M. Symonds, "Factors Influencing Test Reliability," Journal
of Educational Psychology, 19: 73-87 (1928).


CHAPTER  III 


SURVEY  OF  STANDARDS  SET  BY  TEST  THEORISTS  AND  AUTHORS  OF  TEXTS 

AND 

DATA  GIVEN  IN  THE  MANUALS  OF  STANDARD  TESTS 
SURVEY  OF  OPINIONS 

A survey  of  the  opinions  of  a number  of  test  theorists  and  authors 
of  texts  on  tests  and  measurements  has  yielded  the  following  standards 
as  those  which  a test  user  should  find  most  helpful  in  interpreting 
the  value  of  a given  reliability  coefficient  in  the  selection  and 
evaluation  of  tests,  and  in  the  interpretation  of  the  results  of  tests. 

LENGTH  OF  TEST.  It  is  generally  accepted  that,  all  other 
things  being  equal,  the  test  with  a greater  number  of  items  and  the 
test  taking  the  longer  time  to  do,  is  the  more  reliable. 

"The most important single factor influencing test reliability
is the number of items. The greater the number of items in a test,
the more reliable the test." And then again, "The longer time a
test occupies the greater its reliability." 1/

"A  two-  or  three-  hour  examination  is  needed  to  determine, 
approximately,  individual  fitness  for  college  work,  but  a carefully 
devised  five  minute  examination  given  to  all  entering  students  of 
two  universities  would  easily  enable  one  to  tell  which  of  the 
universities  drew  the  more  capable  students." 


1. P. M. Symonds, Op. cit.

2.  T.  L.  Kelley,  Interp.— pp. 


Since group differentiation according to Kelley's own standards
only calls for a reliability coefficient of .50, the group test
could be much shorter, all other things being equal, than the test
for individual diagnosis with standards which he sets at .94 and .98.
This does not mean that a test that is twice as long will be
twice as reliable. "DeMoivre (1733) established the fact that ac-

versely as the square root of the size of the sample:


This  is  the  reason  that  reliability  coefficients  found  by  the 
split-half  method  are  corrected  by  the  Spearman-Brown  formula  which 
predicts  the  reliability  for  the  whole  test  from  the  already  obtained 
reliability  coefficient  of  the  correlated  halves. 
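
The relation between length and reliability mentioned above can be worked through numerically. The sketch below assumes the general form of the Spearman-Brown formula and assumes further that any added material is equivalent to the original items; the function names are hypothetical, and the figures .50 and .94 are simply Kelley's standards for group and individual measurement used as an example.

```python
def lengthened_reliability(r, k):
    """Spearman-Brown estimate for a test made k times as long.

    Assumes the added material is equivalent to the original items.
    """
    return k * r / (1 + (k - 1) * r)


def length_factor(r, target):
    """How many times as long the test must be for r to rise to target."""
    return target * (1 - r) / (r * (1 - target))


# Doubling a test whose reliability is .50 does not double the coefficient.
print(round(lengthened_reliability(0.50, 2), 3))

# Length factor needed to go from the group standard (.50)
# to the standard for individual measurement (.94).
print(round(length_factor(0.50, 0.94), 1))
```

Under these assumptions, doubling the test raises a coefficient of .50 only to about .67, and reaching .94 would call for a test between fifteen and sixteen times as long.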

TYPE  OF  CORRELATION.  Three  types  of  reliability  coefficients 
are  in  general  use. 

An almost countless number of studies have been done, each
extolling one type of coefficient as superior to the others. But
more recent studies tend to accent the fact "that different methods
of computing 'reliability' give different results, that the range
of applicability of each is limited, and that the choice of method
depends on the purpose and conditions of the investigation." 2/

"No one of these is the right coefficient. They measure
different things and each is useful. What is important is to avoid

1.  T.  L.  Kelley  Op.  cit. — 

2. H. H. Remmers and Laurence Whisler, "Test Reliability as a
Function of the Method of Computation," Journal of Educational
Psychology, 29 (1938), pp. 81-92.
confusing one with another, and using one as an estimate of another." 1/
Cronbach then explains his theory that the retest coefficient
gives a coefficient of stability; the internal consistency coefficient, a
coefficient of equivalence; and the interform coefficient, a coefficient
of stability and equivalence; and that the hypothetical self-correlation
is the coefficient obtained by the Kuder-Richardson formulas. 2/

Thorndike, in reference to experiments done with psychomotor
tests involving progressive learning, where the emphasis is on skill of
performance after a considerable period of time, found the retest
reliability coefficient the most desirable. In evaluating a test score
in relation to other tests, correlation of the first half with the second
serves the purpose. He also finds this comparable to immediate retest.
This  is  not  so  in  the  case  of  speed  tests.  In  tests  where  the 
element  of  speed  is  important,  the  test  "should  be  constructed  in  two 
equivalent  parts  which  may  be  separately  timed."  3/ 

Underlying  the  use  of  these  coefficients  are  certain  assumptions 
which  must  be  fulfilled  in  order  to  make  the  results  meaningful  and 
accurate.  The  fulfilling  of  these  assumptions  is  the  test  maker’s 
problem  but  it  would  be  helpful  to  the  test  user  to  know  in  what 
manner  this  was  done. 

POPULATION. Reliability coefficients should be determined on a
group whose range of achievement is similar to that of the group within
which the test will be used to discriminate among individuals.

1. H. H. Remmers and Laurence Whisler, Op. cit., pp. 81-92.

2. L. J. Cronbach, Op. cit., pp. 1-16.

3. R. L. Thorndike, Op. cit., pp.



"Each  reliability  coefficient  must  be  accompanied  by  a description 
of  the  group  upon  which  it  is  based  to  be  meaningfully  interpreted." 

"The  greater  variability  in  a group  of  pupils,  the  higher  the 
reliability  coefficients.  Consequently  the  reliability  coefficients 
of  a test  given  to  several  grades  is  higher  than  that  of  the  same  test 
given  to  a single  grade  since  the  range  of  achievement  is  larger  in  the 
former  case." 

"Reliability  of  tests  designed  to  reveal  difference  of  achievement 
in  a single  classroom  should  be  determined  upon  a group  of  pupils 
within  a similarly  restricted  range  of  achievement.  Reliability 
determined  on  pupils  ranging  among  several  classrooms  or  different 
geographic  areas  or  differing  in  certain  other  factors  affecting 
achievement  will  be  spuriously  high  and  give  a false  picture  of  the 
reliability for use in the single classroom. For this reason it is
well  for  intelligence  tests  to  give  reliability  coefficients  for  the 
range  of  chronological  age."  1/ 
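
The effect of the range of achievement on the coefficient can be illustrated with a small calculation. The sketch below is not drawn from the authors quoted; it assumes that the error variance of the test is the same in the wide and the narrow group, so that only the spread of true achievement differs. The function name and the figures are hypothetical.

```python
def reliability_in_narrower_group(r_wide, sd_wide, sd_narrow):
    """Estimate reliability in a group with a different spread of scores.

    Assumes the error variance of the test is the same in both groups,
    so that only the spread of true ability changes.
    """
    error_variance = sd_wide ** 2 * (1 - r_wide)
    return 1 - error_variance / sd_narrow ** 2


# Hypothetical figures: .95 found across several grades (S.D. 20),
# applied to a single classroom whose S.D. is only 10.
print(round(reliability_in_narrower_group(0.95, 20, 10), 2))
```

On these assumed figures a coefficient of .95 reported for several grades together corresponds to only about .80 within the single classroom, which is the sense in which the wider-range figure is "spuriously high."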

The  number  of  pupils  on  which  the  correlations  were  based  and 
random  sampling  also  affect  the  size  of  the  coefficient  of  reliability. 

"It follows that we should specify the population to which the
estimate refers. . . . It will be realized that the number of groups
possible is very large, and unless the author gives us some idea of
what group or groups were sampled, the value he quotes will, at least
to a certain extent, be meaningless." 2/

1. H. H. Remmers and N. L. Gage, Educational Measurement and Evaluation,
Harper and Brothers, New York, 1943.

2.  R.  W.  B.  Jackson  and  G.  A.  Ferguson,  Op.  cit.  
TYPE OF TEST. "The type of test or nature of what is being
measured affects reliability coefficients. Ability to recall is easier
to measure than true understanding, appreciation, and interpretation. . . .
Tests of factual information naturally have higher reliability
coefficients than tests measuring less tangible objectives of teaching." 1/

"The reliability coefficient calculated from mental age and
intelligence quotient scores will not necessarily be the same and we cannot
speak simply of 'the' reliability of a test. Which should be given?
Some workers may wish to use the mental age and others the intelligence
quotient scores. To insure general satisfaction, all the coefficients
should be given in order that a worker may use the value appropriate to
his particular problem." 2/

NUMERICAL  STANDARDS.  It  might  be  well  at  this  point  to  call  to 
mind  a statement  from  Dr.  Walker’s  book  on  statistics,  "We  shall  have 
to  consider  many  aspects  of  a correlation  in  order  to  build  up  some 
feeling  about  the  import  of  the  size  of  a coefficient  under  particular 
circumstances."

This  statement  of  Dr.  Walker’s  is  really  another  way  of  stating  the 
underlying  purpose  of  this  study  and  of  innumerable  other  studies  on  the 
reliability  coefficient.  It  is  not  a definite  numerical  value  which  can 
stand  alone  but  an  estimate  which  must  be  weighed  and  interpreted  in 
terms of the many factors involved in test construction, evaluation and
interpretation. 

1. H. M. Walker, Elementary Statistical Methods, Henry Holt & Co.,
New York, 1943, p. 246.

2. R. W. B. Jackson and G. A. Ferguson, Op. cit.
Only after a given reliability coefficient has been considered in
the light of these factors should conclusions be drawn as to its being
'high' and 'satisfactory' or 'low' and 'unsatisfactory'.

In spite of this, certain theorists have set up numerical standards
for reliability coefficients in the field of measurement which, even
though they are not mandatory for test makers and users, have become
useful as guide-posts to both.

The work of Truman L. Kelley is the most outstanding on this issue.
For the minimum reliability of a test to be used in a single school
grade, he sets .98 as the reliability coefficient necessary in "tests
for the measurement of differences in the individual abilities and
accomplishments in several scholastic lines and an estimate of the
probability of persistence of differences, of the sort revealed, in
future school work or vocation."

He sets .94 "for the measurement of the past general scholastic
success  and  the  future  promise  of  an  individual  in  a specific  school 
subject." 

His  next  step  down  is  .90,  "for  the  measurement  of  relative 
differences  in  achievement  of  the  group  in  two  or  more  scholastic  lines 
and  an  estimate  of  the  significance  of  such  differences." 

He  then  goes  down  to  .50  "for  measurement  of  the  general  group 
(Grade  or  school)  accomplishments  and  an  estimate  of  the  probable  future 
of  general  group  success  in  school  work."  This  he  at  first  considered 
the  lowest  limit  of  acceptability  for  the  reliability  of  tests  in  a 
single school grade. 1/

Later, however, in reference to a specific test, he accepted .45 as
having predictive value for group success. 2/

Concerning the criterion of .94 he wrote, "This is . . . a
reliability coefficient as found from a one-grade range of talent, and since
it is a rather high coefficient, it is obvious that relatively few of
our intelligence and achievement tests meet this standard of reliability.
We are forced to conclude that if they do not, they are of doubtful
value in connection with the more important problems involving
individual classification." It must also be assumed that he is speaking of
the reliability coefficient as obtained from interform correlation,
since he draws his conclusions from calculations ending with, "We obtain
rll = *u meaning the correlation of equivalent forms. 1/

"With such extremely high reliability coefficients as r = .89,
gaps still remain between obtained and true scores, it is obvious that
reliabilities below .90 are of little value if we require accurate
determination of individual scores. When r = .98, the chances are still
68 to 100 that a given obtained score will diverge from its true
counterpart by as much as .1414 times the standard deviation of the
test." 3/
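
The figure of .1414 in the passage above agrees with the usual expression for the standard error of an obtained score, the standard deviation multiplied by the square root of one minus the reliability coefficient. The following sketch assumes that expression, which is not spelled out in the quotation itself, and the function name is hypothetical; it shows how slowly the error shrinks as the coefficient rises through Kelley's standards.

```python
from math import sqrt


def standard_error_of_score(reliability, sd=1.0):
    """Standard error of an obtained score: sd * sqrt(1 - r)."""
    return sd * sqrt(1 - reliability)


# Error expressed in standard deviation units for several coefficients.
for r in (0.98, 0.94, 0.90, 0.50):
    print(r, round(standard_error_of_score(r), 4))
```

Passing an actual standard deviation as the second argument gives the error in score points rather than in standard deviation units.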

1. T. L. Kelley, Interpretation of Educational Measurements, World Book Co.,
Yonkers-on-Hudson, New York, 1927, pp. 210-211.

2. T. L. Kelley, Tests and Measurement in the Social Studies, Chas.
Scribner's Sons, Boston, 1934, p. 500.

3. H. E. Garrett, Statistics in Psychology and Education (Rev. ed.),
Longmans, Green & Co., New York, 1937.


OTHER STATISTICS. In evaluating a given reliability coefficient
for the purpose he has in mind, the test user may use a number of other
statistical concepts, and test theorists agree that such information
should be included in the reliability information given in the test
manual.

Mean and Standard Deviation. The range of talent involved in
determining  the  reliability  coefficient  is  best  stated  in  terms  of  the 
standard  deviation  of  test  scores.  1/ 

"The  correlation  between  two  forms  of  the  same  test  is  always 
increased  by  an  increase  in  variability  of  the  scores.  So  striking  is 
this  phenomenon,  that  whenever  the  reliability  of  a test  is  published, 
it  is  considered  essential  to  state  for  what  group  it  was  computed, 
and what was the standard deviation of that group." 2/

The  standard  deviation  is  also  considered  important  as  a "device 
for  locating  the  extent  to  which  an  individual  departs  from  the  group 
average  on  any  scale  of  measurement.  In  this  regard,  it  is  interesting 
to  notice  that  the  average  is  the  point  from  which  the  individual  is 
judged  to  vary  in  a high  or  low  direction.  Measurements  are  interesting 
in  so  far  as  they  tell  whether  an  individual  is  significantly  above 
or  below  average,  or  is  in  the  average  range.  We  can  also  use  the  S.  D. 
in  comparing  the  variability  of  scores  made  by  different  groups  on 
the same test." 3/


1. G. M. Ruch, "Minimum Essentials in Reporting Test Data on Standard
Tests," Jour. of Ed. Research, XII (1925), pp. 349-358.

2. H. M. Walker, Elementary Statistical Methods, Henry Holt & Co., N. Y.,
1943, p. 258.

3. J. G. Darley, Testing and Counseling in the High School Guidance
Program, Science Research Associates, Chicago, 1943.


In judging the standing of an individual or group in this manner
it becomes essential to know the mean as the "value around which
fluctuation is measured." 1/
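As an illustration of how the mean and the S. D. together locate an individual, the following sketch expresses each score as a deviation from the group mean in S. D. units. It is a minimal example only; the function name and the three scores are hypothetical, and the S. D. is computed over the whole group rather than as a sample estimate.

```python
from statistics import mean, pstdev


def standard_scores(scores):
    """Return each score's distance from the group mean in S. D. units."""
    group_mean = mean(scores)
    group_sd = pstdev(scores)  # S. D. of the whole group
    if group_sd == 0:
        raise ValueError("all scores are equal; the S. D. is zero")
    return [(score - group_mean) / group_sd for score in scores]


# Hypothetical raw scores for three pupils.
grade_scores = [60, 70, 80]
z = standard_scores(grade_scores)
print([round(value, 2) for value in z])
```

A pupil at the mean receives zero, and pupils above or below it receive positive or negative values that can be compared across groups.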

Probable Error and Standard Error. Probable error and standard
error have the same interpretation for the test user. The formula
P. E. = .6745 S. E. makes them interchangeable. Either may be given for
use in interpreting individual scores.

"One method of expressing the reliability of scores on a test
which has the advantage of being independent of the range of talent
used in determining the reliability is the standard error of a true
score. This statistic tells the range within which scores on the
same test would be expected to fall 2/3 of the time if a very large
number of the tests, equivalent in all respects, were given the pupil.
Thus, if a pupil's true score on a test is 70 and the standard error
of this score is four, then his true score would be 66 to 74, 2/3 of the
time if additional forms of the same test were given him." 2/
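The example in the quotation can be reproduced with the usual formula for the standard error of measurement, S. E. = S. D. x sqrt(1 - r). The sketch below is an illustration only; the S. D. of 10 and the coefficient of .84 are hypothetical values chosen so that the standard error comes out at four, as in the quotation.

```python
import math


def standard_error_of_measurement(sd: float, reliability: float) -> float:
    """S. E. of a score from the group S. D. and the reliability coefficient."""
    if not 0.0 <= reliability <= 1.0:
        raise ValueError("reliability must lie between 0 and 1")
    return sd * math.sqrt(1.0 - reliability)


def score_band(score: float, error: float, width: float = 1.0):
    """Limits lying `width` errors on either side of the score."""
    return (score - width * error, score + width * error)


# Hypothetical values: S. D. = 10 and r = .84 give a standard error of 4.
sem = standard_error_of_measurement(sd=10.0, reliability=0.84)
low, high = score_band(70.0, sem)
print(round(sem, 2), round(low, 1), round(high, 1))
```

One standard error on either side of the score covers about two thirds of the expected scores, which matches the 66 to 74 range quoted above.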

Similar references are found regarding the use of the probable
error.

"The  probable  error  of  a grade  equivalent  or  the  probable  error  of 
an  age  equivalent  gives  an  estimate  of  the  limits  within  which  the 
child's  score  is  likely  to  fluctuate  on  a retest  due  to  sampling  or 
chance  errors." 
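Since manuals report sometimes the probable error and sometimes the standard error, a test user may wish to convert one to the other. The following sketch assumes normally distributed errors, in which case half of the cases lie within .6745 standard errors of the mean; the function names are hypothetical.

```python
from statistics import NormalDist

# Half of a normal distribution lies within 0.6745 S. D. of the mean.
PE_FACTOR = 0.6745


def probable_error_from_se(se: float) -> float:
    """Convert a standard error to a probable error."""
    return PE_FACTOR * se


def se_from_probable_error(pe: float) -> float:
    """Convert a probable error back to a standard error."""
    return pe / PE_FACTOR


# The constant can be checked against the normal curve itself.
exact_factor = NormalDist().inv_cdf(0.75)

pe = probable_error_from_se(4.0)
print(round(exact_factor, 4), round(pe, 3))
```

A probable error therefore marks limits within which a score is expected to fall half of the time, while a standard error marks limits for about two thirds of the time.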


1.  H.  M.  Walker,  Op.  cit.  p.  25. 

2.  H.  H.  Remmers  and  N.  L.  Gage,  Op.  cit. 

3. Manual for "Durrell-Sullivan Reading Capacity and Achievement Tests,"
World Book Co., 1945.


There  are  a number  of  standard  tests  on  the  market  which  give  no 
reliability coefficients in their manuals.

"The reasoning is that those are best and most reliable which
give the highest coefficient of correlation of this sort" (re-test
and split-half correlation). "We believe that the correctness of this
conclusion depends on the tests and circumstances and that as a matter
of fact it has more often been wrong than correct." 1/

"The reliability is expressed in terms of the probable error of
the C-score unit because this is more definite and concrete than the
coefficient of reliability." 2/

There  are  other  test  theorists  and  test  makers  who  believe  in 
other  ways  of  expressing  reliability  than  through  the  use  of  the 
reliability  coefficient.  However,  this  phase  of  the  subject  is  not 
really  within  the  scope  of  this  study. 

It is considered of value, however, to note what proportion of
the tests examined do not give data concerning the reliability
coefficient. Therefore, such tests will be listed under 'no data' even
though other data concerning the test's reliability are available.


1. F. Kuhlmann and R. Anderson, Kuhlmann-Anderson Test Manual,
Educational Test Bureau, Philadelphia, 1927-47, p. 9.

2. Brueckner, Anderson, Van Wagenen, Unit Scales of Attainment Test
Manual, Educational Test Bureau, Philadelphia, 1953-59, p. 5.

TEST MANUALS EXAMINED FOR DATA


Clapp-Young Self Marking Tests. Clapp-Young Arithmetic Test Manual
gives a reliability coefficient for the grade range of 5 to 8 and the
P. E.

Los  Angeles  Diagnostic  Test.  Los  Angeles  Diagnostic  Test  Manual 
gives  no  data  on  reliability  coefficients. 

Woody-McCall Arithmetic Fundamentals. Woody-McCall Test of
Arithmetic Fundamentals Manual also gives no data.

Lewerenz Test in Fundamental Abilities of Visual Art. Lewerenz
Art Test Manual gives a retest coefficient for the grade range ^-12
and the P. E.

Iowa Every Pupil Test of Basic Skills. Iowa Test of Basic Skills
Manual gives no data.

Metropolitan  Achievement  Test.  Metropolitan  Achievement  Test 
Manual  gives  the  reliability  coefficient  for  split-half  correlation, 
corrected  by  Spearman-Brown  Formula,  for  grade  5 for  each  subtest.  It 
also  gives  the  number  of  pupils  the  correlations  were  based  on;  the 
mean  raw  score,  the  S.  D.  in  terms  of  standard  scores  and  the  S.  E.  of 
measures  are  also  given  for  each  subtest. 

Progressive  Achievement  Test.  Progressive  Manual  gives  interform 
coefficients  for  the  grade  range  (4-6)  for  each  subtest  and  Kuder- 
Richardson  coefficients  for  each  subtest  at  each  grade  level;  also  K.  R. 
coefficients  for  the  total  test  and  the  total  range  (4-6). 

Stanford Achievement Battery. Stanford Achievement Manual gives
the coefficients obtained by split-half correlation, corrected with the
Spearman-Brown Formula, and the S. D. and P. E. for each subtest at the
fifth grade level. The number of pupils used for the correlation is
also given.

Unit Scales of Attainment Battery. Unit Scales of Attainment
Manual gives no data.

Gates-Strang Health Knowledge Test. Gates-Strang Health Test
Manual gives coefficients for each form obtained by the split-half
method corrected with the Spearman-Brown formula. It also gives
interform coefficients based on 811 cases, grade range 5-8, forms A, B, C.

Engle-Stenquist Home Economics Test. The Engle-Stenquist Test
Manual gives the coefficients for grades 5 and 6 and the approximate
P. E. for each test (Cookery and Clothing).

The Clapp-Young English Test. Clapp-Young English Test Manual
gives a coefficient and P. E. for the two grades combined and the
median score for each grade.

Iowa Language Abilities Test. Iowa Language Abilities Test Manual
gives split-half coefficients for each form, A, B, C, Am, Bm, Cm, of
each subtest at the grade five level with the S. E. for each and the
number of pupils the correlations were based on.

Leonard Diagnostic Test in Punctuation and Capitalization. The
Leonard Diagnostic Test Manual gives a coefficient of correlation
between forms A and B for grade 5 and grade 6, the S. D. for each and the
number of pupils on which the coefficients were based.

Los Angeles Diagnostic Test. The Los Angeles Diagnostic Language
Usage Test Manual gives no data.

Language Essentials Test. The Manual for the Language Essentials
Test by Schrammel and Davis gives interform coefficients for each form
of the whole test at each grade level, with median scores, S. D.,
and P. E., and the interform coefficient for each subtest for the grade
range (4-8) with P. E.

The California Test of Mental Maturity. California Test of Mental
Maturity Manual gives split-half coefficients, corrected by the
Spearman-Brown formula, for each subtest at each grade level with the
S. D. and maximum P. E. given in terms of M. A. The coefficients of
this test were calculated in terms of M. A.

Haggerty Intelligence Examination. Haggerty Intelligence
Examination Manual gives a retest coefficient for the total test and
the range of coefficients for each subtest based on the total grade range.

Henmon-Nelson Tests of Mental Ability. The Henmon-Nelson Test
Manual gives the reliability coefficient for each grade level and for
each year of chronological age. It also gives S. D. and P. E. of raw
score for each level.

Kuhlmann-Anderson Intelligence Test. Kuhlmann-Anderson Manual
gives no data concerning the reliability coefficient.

Multi-Mental Scale. Multi-Mental Scale Manual gives no data
concerning the reliability coefficient.

National Intelligence Tests. National Intelligence Test Manual
gives no data.

Otis Quick Scoring Test of Mental Ability. Otis Quick Scoring
Test Manual gives two interform coefficients for each grade; one with
form A given first and one with form B given first. P. E. is also
given with each coefficient.

A split-half coefficient, corrected with the Spearman-Brown
formula, is given for form 0, for each grade.

The average number of pupils on whom the findings are based is
given.

Pintner General Abilities Test. Pintner General Abilities Test
Manual gives the split-half coefficient, corrected with the Spearman-
Brown formula, for each form for the total grade range (4-9). S. D.
and P. E. are also given. The correlations are based on random
sampling of pupils and the number of pupils for each group is given. The
range of chronological age of the pupils tested is also given.

Chapman-Cook  Speed  of  Reading  Test.  The  Chapman-Cook  Reading 
Test  Manual  gives  no  data. 

The Diagnostic Examination of Silent Reading Abilities. The
Diagnostic Reading Test Manual gives no data.

Durrell and Sullivan Reading Achievement and Capacity Tests. Durrell
and Sullivan Test Manual gives a split-half coefficient, corrected with
the Spearman-Brown formula, for each subtest at each grade level, also
for the grade range and for the whole test, as is the P. E. of the score.
The coefficients are based on random sampling. The number of pupils in
each group is given.

Gates Basic Reading Test. Gates Basic Reading Test Manual gives
a retest coefficient for two grade four and grade six groups, one being
a larger group than the other. The coefficients are based on random
sampling and the number of pupils in each group is given.

Haggerty Reading Examination. Haggerty Reading Examination
Manual gives a retest coefficient for each test and for the total test
based on 126 pupils ranging from grade 5 0 to grade 8 A.

Ingraham-Clark Diagnostic Reading Test. The Ingraham-Clark
Reading Test Manual gives a split-half coefficient, corrected with the
Spearman-Brown formula, for each grade and for the grade range (4-8).

Iowa  Silent  Reading  Test.  The  Iowa  Reading  Test  Manual  gives  a 
split-half  coefficient,  corrected  with  the  Spearman-Brown  formula,  for 
each  subtest  for  grade  6.  It  gives  retest  coefficients  for  each  subtest 
at  each  grade  level  and  for  the  total  test.  The  S.  D.  and  P.  E.  are 
given  with  each  coefficient. 

Los Angeles Reading Test. The Los Angeles Reading Test Manual
gives an interform coefficient for its four forms paired in every
possible combination. The population consisted of 420 pupils ranging
from grade 5 to 8.

Nelson Silent Reading Test. Nelson Silent Reading Test Manual
gives no data.

Sentence Vocabulary Test. The Sentence Vocabulary Test Manual also
gives no data.

Sangren-Woody Reading Test. Sangren-Woody Test Manual gives a
coefficient for each subtest at 6 B grade level and the P. E.

Burton Civics Test. The Burton Civics Test Manual gives
interform coefficients for its two forms; one with form A given first, the
other with form B administered first. Tests were given to 500 pupils.

Wiedefeld-Walther Geography Test. Wiedefeld-Walther Geography Test
Manual gives an interform coefficient for the total test and each of its
two forms at each grade level and an approximate P. E. in terms of
score points and grade equivalents.

Summary of Data. The manuals for 37 tests were examined. Twelve
of these give no data concerning the reliability coefficient.
Consequently, the rest of the data must be measured in the light of the
remaining 25 tests.

The  time  for  giving  the  test  and  the  number  of  items  usually 
were  found  by  looking  through  the  directions  for  giving  the  test  or 
by  examining  the  test  itself. 

Ten  tests  used  split-half  correlations  with  only  9 stating 
specifically  that  the  coefficients  were  corrected  with  the  Spearman- 
Brown  Formula.  Two  tests  report  a Kuder-Richardson  coefficient  but 
in  both  cases  another  type  of  coefficient  is  also  given.  Four  tests 
report  retest  coefficients  and  nineteen  give  interforms.  Adding 
split-half,  Kuder-Richardson,  retest  and  interform,  gives  25  but 
this  includes  5 tests  which  give  two  types  of  coefficient  and  5 
which  do  not  specify  how  the  coefficient  was  obtained,  simply  referring 
to it as a coefficient of correlation or a reliability coefficient.
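Two of the coefficient types tallied above rest on standard formulas that a test user can apply directly. The sketch below shows the Spearman-Brown correction of a split-half correlation and the Kuder-Richardson formula 20 computed from right-and-wrong item marks. It is an illustration only: the function names, the half-test correlation of .80 and the four pupils' item marks are hypothetical and are not drawn from any manual examined here.

```python
def spearman_brown(r_part: float, length_factor: float = 2.0) -> float:
    """Estimate the reliability of a test `length_factor` times as long.

    With length_factor = 2 this corrects a correlation between two halves
    of a test to an estimate for the whole test.
    """
    return length_factor * r_part / (1.0 + (length_factor - 1.0) * r_part)


def kuder_richardson_20(item_marks) -> float:
    """KR-20 from rows of 0/1 item marks, one row per pupil."""
    n_pupils = len(item_marks)
    n_items = len(item_marks[0])
    if n_items < 2:
        raise ValueError("at least two items are needed")
    totals = [sum(row) for row in item_marks]
    mean_total = sum(totals) / n_pupils
    # Variance of total scores over the whole group.
    var_total = sum((t - mean_total) ** 2 for t in totals) / n_pupils
    if var_total == 0:
        raise ValueError("total scores do not vary")
    # Sum over items of p * q, where p is the proportion passing the item.
    sum_pq = 0.0
    for j in range(n_items):
        p = sum(row[j] for row in item_marks) / n_pupils
        sum_pq += p * (1.0 - p)
    return (n_items / (n_items - 1)) * (1.0 - sum_pq / var_total)


# Hypothetical half-test correlation of .80.
corrected = spearman_brown(0.80)

# Hypothetical marks for four pupils on a three-item test.
marks = [
    [1, 1, 1],
    [1, 1, 0],
    [1, 0, 0],
    [0, 0, 0],
]
kr20 = kuder_richardson_20(marks)
print(round(corrected, 3), round(kr20, 2))
```

The uncorrected half-test figure is always lower than the corrected one, so a manual that reports a split-half coefficient without saying whether the correction was applied leaves the user unable to compare it with other tests.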

Nineteen  of  the  25  manuals  give  the  number  of  pupils  on  which  the 
correlations  were  based. 

Of  the  25  tests  now  being  considered  5 do  not  include  grade 
four  in  their  grade  range,  12  others  give  coefficients  for  the 
fourth  grade  level.  There  is  only  1 of  these  tests  which  does  not 
include  fifth  grade  in  its  range  and  17  of  them  give  coefficients  for 
the  fifth  grade  level.  Fifteen  of  the  manuals  give  coefficients 
for  the  sixth  grade  and  18  give  coefficients  for  the  total  range  of 
grades for which the test was prepared.

Some manuals give coefficients for only one grade, others give
them for more than one grade, while still others give them for both
range and individual grade.

The age range of the group on which the correlations were based
is given in 2 manuals.

Five  manuals  state  that  random  sampling  technique  was  used. 

One  intelligence  test  gives  the  coefficient  based  on  mental  age. 
The  others  are  based  on  test  scores.  None  of  them  give  coefficients 
based  on  intelligence  quotients. 

Tests of factual knowledge and skill are not always easily
discernible one from the other, and few test manuals explain which of
these the test is trying to measure.

One test manual gave the numerical values of item difficulty of
the items in its test.

Of the coefficients given 2 are .98; 57 are between .94 and .98;
65 are between .94 and .90; 70 are between .85 and .90; 56 are between
.80 and .85; 17 are between .75 and .80; and 25 are less than .75.

Three tests give the means, 8 give S. D., 2 give S. E. and 16
give P. E.

CHAPTER IV


SUMMARY 

Again  it  becomes  essential  to  bear  in  mind  that  the  reliability 
coefficient  can  never  be  treated  as  a set  and  definite  numerical 
value.  It  is  only  an  estimate  of  reliability  and  as  such  it  must 
always  be  weighed  in  terms  of  other  factors. 

It  is  obvious  that  many  of  the  standards  set  by  theorists,  when 
put  to  the  test  of  practical  application,  fail  to  hold  their  own. 

The  test  user  should  in  the  long  run  be  the  best  judge. 

And standard tests, like many other products for sale on the
market, will probably come of as high quality as the consumer insists
upon, and quality often follows upon the consumer's insistence that the
product be properly labeled.

In  the  matter  of  reliability,  it  should  not  be  too  much  to  ask 
that  every  manual  describe  each  step  taken  to  insure  the  reliability 
of  the  instrument. 

The  purpose  for  selecting  one  method  or  the  other  should  also  be 
stated  to  help  the  test  user  judge  the  techniques  of  the  test  maker 
and  decide  whether  they  are  appropriate  to  the  specific  test. 

If  a test  maker  puts  a test  on  the  market  with  a reliability 
coefficient  of  .75  or  less,  surely  he  should  explain  why  he  considers 
such  a test  worthy  of  being  used. 

The  other  statistical  concepts  used  in  evaluating  a reliability 
coefficient  surely  should  be  found  with  all  tests,  since  they  are  also 
vital  in  interpreting  test  results. 


As  part  of  the  description  of  the  test  it  would  be  most  helpful 
if  all  the  information  related  to  the  factors  mentioned  on  the  data 
tables  of  this  study  could  be  assembled  in  table  form  to  give  a 
concise  summary  of  how  the  test  measures  up  to  acceptable  standards. 
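One possible layout for such a summary table is sketched below. It is only a suggestion: the field names are hypothetical, the columns follow the kinds of information discussed in this study, and the sample entry is an invented test with invented figures, not data from any manual examined.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class ReliabilityRecord:
    """One row of a manual's reliability summary (hypothetical layout)."""
    test_name: str
    method: str                      # e.g. "split-half", "retest", "interform"
    grade_range: str
    coefficient: Optional[float] = None
    n_pupils: Optional[int] = None
    gives_sd: bool = False
    gives_pe: bool = False

    def needs_explanation(self, floor: float = 0.75) -> bool:
        """True when a coefficient is reported at or below the floor."""
        return self.coefficient is not None and self.coefficient <= floor


def format_row(rec: ReliabilityRecord) -> str:
    """Render one record as a fixed-width line of plain text."""
    coef = "no data" if rec.coefficient is None else f"{rec.coefficient:.2f}"
    n = "not given" if rec.n_pupils is None else str(rec.n_pupils)
    sd = "yes" if rec.gives_sd else "no"
    pe = "yes" if rec.gives_pe else "no"
    return (f"{rec.test_name:<18}{rec.method:<12}{rec.grade_range:<7}"
            f"{coef:<9}{n:<11}{sd:<5}{pe:<5}")


# Invented entries for illustration only.
example = ReliabilityRecord("Example Test A", "split-half", "4-6",
                            coefficient=0.72, n_pupils=250, gives_sd=True)
silent = ReliabilityRecord("Example Test B", "none", "5-8")

header = (f"{'Test':<18}{'Method':<12}{'Grades':<7}"
          f"{'Coef.':<9}{'Pupils':<11}{'S.D.':<5}{'P.E.':<5}")
print(header)
print(format_row(example))
print(format_row(silent))
```

A row whose coefficient falls at or below .75 is flagged by `needs_explanation`, reflecting the request above that the test maker explain why such a test is worthy of use.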


CHAPTER  V 


CONCLUSION 

CONCLUSION. Theorists not working with tests are apt to set too
high standards for reliability coefficients for measurement of
differences among individuals.

Guidance  people  and  other  workers  using  tests  tend  to  take  a 
more  practical  point  of  view. 

Data given in manuals are insufficient and should be more
standardized. It would seem that test users would also have to be
statisticians of the first order to be able to interpret a comparison
of one set of data to the other.

Test  makers  should  give  a description  of  the  construction  of 
the  test  with  explanations  for  the  techniques  applied. 

Short resumes of all the most pertinent data (preferably in
table form) should be included in the manual for convenience of
test users.




CHAPTER  VI 


SUGGESTIONS  FOR  FURTHER  STUDY 

1.  Study  of  other  ways  of  expressing  the  estimate  of  reliability 
to  see  if  some  might  be  more  accurate  and  less  complicated. 

2.  Follow  up  of  results  of  individual  diagnosis  to  establish 
whether  statisticians  are  setting  standards  too  high. 

3. More careful diagnosis of the purposes for which each of the
three different types of reliability coefficients is best suited.



BIBLIOGRAPHY


Anastasi, A., "Influence of Practice Upon Test Reliability," Journal of
Educational Psychology, Vol. 25, May 1945, pp. 521-525.

Barr, A. S., "The Coefficient of Correlation," Journal of Educational
Research, 1951, Vol. 25, pp. 55-60.

Conrad, H. S. and Martin, G. B., "The Index of Forecasting Efficiency
for the Case of a 'True' Criterion," Journal of Experimental
Education, 1955, Vol. 4, pp. 251-244.

Cronbach, Lee J., "Test 'Reliability': Its Meaning and Determination,"
Psychometrika, Vol. XII, March 1947, pp. 1-16.

Darley, J. G., Testing and Counseling in the High School Guidance
Program, Science Research Associates, Chicago, 1945.

Dunlap, Jack W., "Comparable Test and Reliability," Journal of
Educational Psychology, Vol. XXIV, September 1955, pp. 442-455.

Franzen, R. H. and Derryberry, M., "Note on Reliability Coefficient,"
Journal of Educational Psychology, Vol. 25, 1952, pp. 559-560.

Garrett, Henry E., Statistics in Psychology, Longman, Green & Co.,
New York, 1958.

Goodenough, Florence L., "Note on an Unnecessary Source of Confusion
in Statistical Terminology," Journal of Educational Psychology,
Vol. XXXVIII, November 194?, pp. 442-445.

Greene, H. A., Jorgensen, A. N. and Gerberich, J. R., Measurement and
Evaluation in the Elementary School, Longman, Green & Co.,
New York, 1945.

Gulliksen, Harold, "The Relation of Item Difficulty and Inter-Item
Correlation to Test Variance and Reliability," Psychometrika,
Vol. X, June 1945, pp. 79-91.

Guttman, Louis, "The Test-Retest Reliability of Qualitative Data,"
Psychometrika, Vol. XI, June 1946, pp. 81-95.

Jackson, R. W. B. and Ferguson, G. A., Studies in Reliability of Tests,
University of Toronto, Toronto, 1941.

Jordan, R. C., "An Empirical Study of the Reliability Coefficient,"
Journal of Educational Psychology, Vol. 26, 1955, pp. 416-426.

Kelley, T. L., Interpretation of Educational Measurements, World Book
Co., Yonkers-on-Hudson, New York, 1927.

Kelley, T. L., Fundamentals of Statistics, Harvard University Press,
Cambridge, 1947.

Kelley, T. L. and Krey, A. C., Test and Measurements in the Social
Science, Charles Scribner's Sons, Boston, 1934, pp. 298-210.

Lindquist, E. F., A First Course in Statistics, Houghton Mifflin Co.,
Boston, 195®, pp. 205-255.

Madsen, I. N., Educational Measurements in the Elementary Grades,
World Book Co., New York, 1950, Vol. 51-69, pp. 15-25.

McCall, Wm. A., Measurement, The Macmillan Co., 1959, pp. 55-60.

Monroe,  W.  S.,  An  Introduction  to  the  Theory  of  Educational  Measurement. 
1925. 

Remmers, H. H. and Gage, N. L., Educational Measurement and Evaluation,
Harper & Bros., New York, 19^5.

Remmers, H. H., Shock, N. W. and Kelley, T. L., "An Empirical Study of
the Validity of the Spearman-Brown Formula as Applied to the
Purdue Rating Scale," Journal of Educational Psychology,
Vol. 18, 1927, pp. 187-195.

Remmers, H. H. and Whistler, Laurence, "Test Reliability as a Function
of the Method of Computation," Journal of Educational Psychology,
Vol. 29, 1958, pp. 81-92.

Richardson, M. W. and Kuder, G. F., "The Calculation of Test Reliability
Coefficients Based on the Method of Rational Equivalence,"
Journal of Educational Psychology, Vol. 50, 1959, pp. 681-687.

Ruch, G. M., "Minimum Essentials in Reporting Data on Standard Tests,"
Journal of Educational Research, Vol. XII, 1925, pp. 5^9-558.

Slocombe, C. S., "Truman L. Kelley Measures Mental Traits," Journal of
Educational Psychology, Vol. 19, 1928, pp. 597-5°l.

Symonds, P. M., "Factors Influencing Test Reliability," Journal of
Educational Psychology, Vol. 19, 1928, pp. 71-87.

Thorndike, R. L., "Logical Dilemmas in the Estimation of Reliability,"
National Projects in Education Measurement, 19^8.

Thorndike, R. L., Research Problem and Techniques, Report No. Army-
Air Force - Aviation Psychology, 1947.

Tiegs, E. W. and Crawford, C. C., Statistics for Teachers, Houghton
Mifflin Co., Boston, 1950.

Walker, H. M., Studies in the History of Statistical Method, The
William Wilkin Co., Baltimore, 1929.

Walker, H. M., Elementary Statistical Methods, Henry Holt & Co.,
New York, 1943.


INDEX  OF  TESTS 


ARITHMETIC 

The Clapp-Young Self-Marking Test, Form A & B,
Grades 5-8, Concrete Problems, Houghton
Mifflin Co., Boston, 1929.

Los  Angeles  Diagnostic,  Grades  5*5-6.7> 

California  Test  Bureau,  Los  Angeles, 
California,  1928. 

Woody-McCall Fundamentals, Form I, II, III, &
IV, Grades 5-8, Teachers College,
Columbia University, New York, 1920.

ART 

Tests  in  Fundamental  Abilities  of  Visual  Art- 
Lewerenz,  Grades  5-12,  California  Test 
Bureau,  Los  Angeles,  California,  1927. 

BATTERIES 

Iowa Every Pupil Test of Basic Skills, Houghton
Mifflin Co., Elementary Battery, 1945.

Metropolitan Achievement Intermediate Battery,
Forms  R,  S,  T,  Y,  & V,  Grades  5>  6,  and 
beginning  Jf  World  Book  Co.,  New  York,  194-6. 

Progressive Achievement Test, Tiegs & Clark,
Elementary Battery, Form A, Grades 4-6,
California Test Bureau, Los Angeles,
California, 1945.

Stanford Achievement Test, Kelley, Ruch, Terman,
Intermediate Battery, Grades 5-8, World
Book Co., New York, 1940.

Unit Scales of Attainment-Brueckner, Anderson, &
Van Wagenen, Educational Test Bureau, Grades
4-6, Philadelphia, Pennsylvania, 1955-59.

HEALTH 

Gates-Strang  Health  Knowledge  Test,  Grades  5-8, 
Form  C,  Teachers  College,  Columbia 
University,  New  York  City,  1957. 

HOME  ECONOMICS 

Home Economics Test-Engle-Stenquist, Foods &
Cookery, Clothing & Textiles, Grades 5-10,
Forms A & B, World Book Co., New York, 1951.

LANGUAGE

The Clapp-Young English Test, Form A, Grades 5-12,
Houghton Mifflin Co., Boston, 1929.

Iowa Language Abilities Test, Elementary forms-
Green & Ballenger, World Book Co., New
York, 1945.

Leonard Diagnostic Test in Punctuation and
Capitalization, Grades 5-H, World Book Co.,
New York, 1951.

Language Essentials Tests-Schrammel & Davis,
Grades 4-8, Educational Test Bureau,
Philadelphia, Pennsylvania, 1941.

Los Angeles Diagnostic Test-Armstrong, Language
Usage, Form A, Grades 5-9, California Test
Bureau, Los Angeles, California, 1927.

MENTAL  ABILITY 

California Test of Mental Maturity-Sullivan,
Tiegs, & Clark, Elementary Series, Grades
4-8, California Test Bureau, Los Angeles,
California, 1946 Revision.

Haggerty Intelligence Examination, Delta 2,
Grades 5-9, World Book Co., New York, 1920.

The Henmon-Nelson Tests of Mental Ability,
Forms A, B, C, Grades 5-8, Houghton Mifflin
Co., Boston, 1946.

Kuhlmann-Anderson Intelligence Tests, Educational
Test Bureau, Philadelphia, Pennsylvania,
1927, '40, '42, '47.

Multi-Mental Scale-McCall, Grades 5-12, Bureau of
Publications, Teachers College, Columbia
University, New York, 1950.

National Intelligence Tests-Haggerty, Terman,
Thorndike, Whipple, & Yerkes, Scale A, Form
1, Grades 5-8, World Book Co., New York,
1920.

Otis Quick-Scoring Tests of Mental Ability,
Intermediate, Grades 4-9, World Book Co.,
New York, 1957-59.

Pintner General Abilities Test; Non-Language
Series, Intermediate Test, Grades 4-9,
World Book Co., New York, 1945.

READING 

Chapman-Cook Speed of Reading Test, Forms A
& B, Grades 4-8, Educational Test Bureau,
Philadelphia, Pennsylvania, 1924.

Diagnostic Examination of Silent Reading Abilities,
Van Wagenen & Dvorak, Intermediate, Grades
4-5 and Junior, Grades 6-8, Form M,
Educational Test Bureau, Philadelphia,
Pennsylvania, 1959.

Durrell and Sullivan-Reading Capacity and
Achievement Tests, Intermediate, Form A,
Grades 5-6, World Book Co., New York, 1945.

Gates Basic Reading Test, Form 1, Grades 2-8,
Teachers College, Columbia University,
New York, 1942.

Haggerty Reading Examination, Sigma 3, Form
A, Grades 6-12, World Book Co., New York,
1920.

Ingraham-Clark Diagnostic Reading Tests,
Intermediate, Form 1 & 2, Pt. I & II, Grades 4-8,
California Test Bureau, Los Angeles,
California, 1929.

Iowa  Silent  Reading  Tests , Elementary  Forms- 

Greene,  Jorgensen,  & Kelley,  World  Book  Co 
New  York,  194% 

The  Nelson  Silent  Reading  Test-Clapp,  Young,  & 
Nelson,  Forms  A,  B,  & C,  Houghton  Mifflin 
Co.,  Boston,  1929. 

Sangren-Woody  Reading  Test,  Form  A,  Grades  4-8, 
World  Book  Co.,  New  York,  1927. 

Sentence  Vocabulary  Test-Armstrong  & Danielson, 
California  Test  Bureau,  Los  Angeles, 
California,  1926. 

SOCIAL  STUDIES 

Burton Civics Test, Form B, Grades 5-9, World
Book Co., New York, 1928.

Wiedefeld-Walther Geography Test, Form B,
Grades 4-8, World Book Co., New York, 1931.


